{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Python Machine Learning in Action: Code Examples\n",
    "\n",
    "# Chapter 14: Logistic Regression"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
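    "## Binary Classification with Logistic Regression\n",
    "\n",
    "About the dataset: the breast cancer dataset is a classic, simple binary classification dataset. It contains 569 samples, of which 212 are malignant (label 0) and 357 are benign (label 1). Each sample has 30 features, all non-negative real numbers. The features fall into three groups of 10: the first 10 are the mean values of the measured attributes, the middle 10 are their standard errors, and the last 10 are their \"worst\" (largest) values."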
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Confusion matrix:\n",
      " [[50  4]\n",
      " [ 2 87]]\n"
     ]
    }
   ],
   "source": [
    "# Import the required libraries.\n",
    "\n",
    "import numpy as np\n",
    "from sklearn import datasets\n",
    "from sklearn.linear_model import LogisticRegression\n",
    "from sklearn.metrics import confusion_matrix\n",
    "from sklearn.model_selection import train_test_split\n",
    "\n",
    "# Load the data and split it into training and test sets.\n",
    "\n",
    "breast_cancer = datasets.load_breast_cancer()\n",
    "x = breast_cancer['data']\n",
    "y = breast_cancer['target']\n",
    "X_train, X_test, y_train, y_test = train_test_split(x, y, random_state=42)\n",
    "\n",
    "# Fit the logistic regression model.\n",
    "\n",
    "log_reg = LogisticRegression()\n",
    "log_reg.fit(X_train, y_train)\n",
    "\n",
    "# Evaluate on the test set and print the confusion matrix.\n",
    "# Rows are the true classes and columns are the predicted classes.\n",
    "\n",
    "y_predict = log_reg.predict(X_test)\n",
    "print('Confusion matrix:\\n', confusion_matrix(y_test, y_predict))\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
